FedPURIN: Programmed Update and Reduced INformation for Sparse Personalized Federated Learning

Xie, Lunchen, He, Zehua, Shi, Qingjiang

arXiv.org Artificial Intelligence

Personalized Federated Learning (PFL) has emerged as a critical research frontier addressing the data-heterogeneity issue across distributed clients. Novel model architectures and collaboration mechanisms are engineered to accommodate statistical disparities while producing client-specific models. Parameter decoupling represents a promising paradigm for maintaining model performance in PFL frameworks. However, the communication efficiency of many existing methods remains suboptimal, imposing substantial communication burdens that impede practical deployment. To bridge this gap, we propose Federated Learning with Programmed Update and Reduced INformation (FedPURIN), a novel framework that strategically identifies critical parameters for transmission through an integer programming formulation. This mathematically grounded strategy is seamlessly integrated into a sparse aggregation scheme, achieving a significant communication reduction while preserving efficacy. Comprehensive evaluations on standard image classification benchmarks under varied non-IID conditions demonstrate competitive performance relative to state-of-the-art methods, coupled with quantifiable communication reduction through sparse aggregation. The framework establishes a new paradigm for communication-efficient PFL, particularly advantageous for edge intelligence systems operating with heterogeneous data sources.

Introduction

Federated learning (FL), as a powerful distributed machine learning scheme, has been extensively studied as a way to harness the abundant data on ubiquitous edge devices [1]. This framework has been successfully applied in various domains, including computer vision [2, 3], healthcare [4, 5], finance [6, 7], and ubiquitous IoT applications [8, 9, 10].
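As a simplified illustration of the idea (the paper's actual integer program is not reproduced here), selecting which entries of a client update to transmit under a budget of k entries can be posed as a 0/1 program whose optimum is a plain top-k magnitude selection:

```python
import numpy as np

def select_critical_updates(update, k):
    """Transmit only the k largest-magnitude entries of a client update.

    This top-k rule is the exact solution of the simple 0/1 program
        max_z  sum_i |u_i| z_i   s.t.  sum_i z_i <= k,  z_i in {0, 1},
    used here as a hypothetical stand-in for FedPURIN's integer
    programming step.
    """
    u = np.asarray(update, dtype=float)
    mask = np.zeros(u.size, dtype=bool)
    mask[np.argsort(np.abs(u))[-k:]] = True
    return u * mask, mask  # sparse update actually sent, and its support

sparse, mask = select_critical_updates([0.1, -2.0, 0.05, 0.7, -0.3], k=2)
```

With k much smaller than the model size, only the masked entries (plus their indices) need to be communicated, which is where the sparse-aggregation savings come from.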


DataStealing: Steal Data from Diffusion Models in Federated Learning with Multiple Trojans

Neural Information Processing Systems

We propose the Adaptive Scale Critical Parameters (AdaSCP) attack to circumvent the defenses and seamlessly incorporate malicious updates into the global model. Specifically, AdaSCP evaluates the importance of parameters with the gradients in dominant timesteps of the diffusion model. Subsequently, it adaptively seeks the optimal scale factor and magnifies critical parameter updates before uploading them to the server. As a result, the malicious update becomes similar to the benign updates, making it difficult for distance-based defenses to identify. Extensive experiments reveal the risk of leaking thousands of images when training diffusion models with FL.
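A minimal sketch of the evasion principle (with a hypothetical `importance` score standing in for the paper's gradient-based one): magnify the most important coordinates of the malicious update, then rescale the whole update so its norm matches the average benign norm that distance-based defenses screen against:

```python
import numpy as np

def scale_to_evade(malicious, benign_updates, importance, top_frac=0.25):
    """Illustrative sketch only: amplify the top-`importance` coordinates
    of a malicious update, then rescale it so its L2 norm matches the
    mean benign norm, the quantity a distance-based defense typically
    inspects. `importance` is assumed given; AdaSCP derives it from
    gradients in dominant diffusion timesteps."""
    u = np.asarray(malicious, dtype=float).copy()
    k = max(1, int(top_frac * u.size))
    u[np.argsort(importance)[-k:]] *= 2.0  # magnify critical entries
    target = np.mean([np.linalg.norm(b) for b in benign_updates])
    return u * target / (np.linalg.norm(u) + 1e-12)
```

The final rescaling is what makes the upload look benign to a norm or distance check, while the magnified critical coordinates carry the malicious payload.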


Optimizing Force Signals from Human Demonstrations of In-Contact Motions

Hartwig, Johannes, Viessmann, Fabian, Henrich, Dominik

arXiv.org Artificial Intelligence

As robot programming of in-contact tasks becomes more prominent, kinesthetic guiding can be an intuitive input method for non-experts. However, imprecise and noisy input signals from human demonstrations pose problems when reproducing motions directly or when using the signal as input for machine learning methods. This paper explores optimizing force signals to correspond better to the human intention behind the demonstrated signal. We compare different signal filtering methods and propose a peak detection method for dealing with first-contact deviations in the signal. The evaluation of these methods considers a specialized error criterion between the input and the human-intended signal. In addition, we analyze the influence of the critical parameters on the filtering methods. The quality for an individual motion could be increased by up to 20% with respect to the error criterion. The proposed contribution can improve the usability of robot programming and the interaction between humans and robots.
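As a hedged sketch of the two ingredients (the paper's concrete filters and peak detector are not reproduced here), a moving-average low-pass filter plus a simple clipping rule for the first-contact transient might look like:

```python
import numpy as np

def moving_average(signal, window=5):
    """Simple low-pass filter; one of several filters one might compare
    when smoothing a noisy demonstrated force signal."""
    kernel = np.ones(window) / window
    return np.convolve(signal, kernel, mode="same")

def suppress_first_contact_peak(signal, threshold):
    """Hypothetical peak handling: clip the leading contact transient,
    i.e. the first run of samples above `threshold`, to the threshold."""
    out = np.asarray(signal, dtype=float).copy()
    i = 0
    while i < out.size and abs(out[i]) <= threshold:  # skip pre-contact samples
        i += 1
    while i < out.size and abs(out[i]) > threshold:   # clip the transient
        out[i] = np.sign(out[i]) * threshold
        i += 1
    return out
```

Here `window` and `threshold` play the role of the critical parameters whose influence on the result the paper analyzes.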


MaZO: Masked Zeroth-Order Optimization for Multi-Task Fine-Tuning of Large Language Models

Zhang, Zhen, Yang, Yifan, Zhen, Kai, Susanj, Nathan, Mouchtaris, Athanasios, Kunzmann, Siegfried, Zhang, Zheng

arXiv.org Artificial Intelligence

Large language models have demonstrated exceptional capabilities across diverse tasks, but their fine-tuning demands significant memory, posing challenges for resource-constrained environments. Zeroth-order (ZO) optimization provides a memory-efficient alternative by eliminating the need for backpropagation. However, ZO optimization suffers from high gradient variance, and prior research has largely focused on single-task learning, leaving its application to multi-task learning unexplored. Multi-task learning is crucial for leveraging shared knowledge across tasks to improve generalization, yet it introduces unique challenges under ZO settings, such as amplified gradient variance and collinearity. In this paper, we present MaZO, the first framework specifically designed for multi-task LLM fine-tuning under ZO optimization. MaZO tackles these challenges at the parameter level through two key innovations: a weight importance metric to identify critical parameters and a multi-task weight update mask to selectively update these parameters, reducing the dimensionality of the parameter space and mitigating task conflicts. Experiments demonstrate that MaZO achieves state-of-the-art performance, surpassing even multi-task learning methods designed for first-order optimization.
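A minimal sketch of a masked zeroth-order update (the mask is taken as given here; MaZO's weight-importance metric that produces it is not reproduced): estimate a directional gradient from two forward evaluations, but perturb and update only the masked coordinates:

```python
import numpy as np

def masked_zo_step(params, loss_fn, mask, eps=1e-3, lr=1e-2, rng=None):
    """One masked zeroth-order (SPSA-style) step: two loss evaluations
    yield a directional derivative, and the random direction is zeroed
    outside `mask`, so only the selected coordinates move. Restricting
    the perturbation this way shrinks the effective dimensionality,
    which is the variance-reduction idea sketched here."""
    rng = np.random.default_rng(rng)
    z = rng.standard_normal(params.shape) * mask  # perturb masked coords only
    g = (loss_fn(params + eps * z) - loss_fn(params - eps * z)) / (2 * eps)
    return params - lr * g * z                    # update masked coords only
```

No backpropagation is needed: the two forward passes are the entire memory cost, which is the appeal of ZO fine-tuning in the first place.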


No Data, No Optimization: A Lightweight Method To Disrupt Neural Networks With Sign-Flips

Galil, Ido, Kimhi, Moshe, El-Yaniv, Ran

arXiv.org Artificial Intelligence

Deep neural networks (DNNs) power a wide range of applications, including safety-critical tasks such as autonomous driving, unmanned aerial vehicle (UAV) navigation, medical diagnostics, and robotics, where real-time decision-making is essential. However, the increasing reliance on DNNs also raises concerns about their resilience to malicious attacks. Ensuring the robustness of DNNs is crucial to maintaining their reliability in such critical applications. In this paper, we expose a critical vulnerability in DNNs that allows for severe disruption by flipping as few as one to ten sign bits, a tiny fraction of the model's parameters. Our method demonstrates how a small number of bit flips, within models containing up to hundreds of millions of parameters, can cause catastrophic degradation in performance. We systematically analyze and identify the parameters most susceptible to sign flips, which we term "critical parameters."
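To make the perturbation concrete: in IEEE-754 float32, flipping the single sign bit (bit 31) negates a parameter in place, which is the kind of one-bit fault such attacks exploit (a generic sketch, not the paper's attack code):

```python
import struct

def flip_sign_bit(x):
    """Flip bit 31 of a float32 value: a single bit flip turns w into -w."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))       # float -> raw bits
    (flipped,) = struct.unpack("<f", struct.pack("<I", bits ^ 0x80000000))
    return flipped
```

The attack's difficulty is not the flip itself but finding which of the millions of parameters are "critical", i.e. whose negation destroys accuracy.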


AttentionBreaker: Adaptive Evolutionary Optimization for Unmasking Vulnerabilities in LLMs through Bit-Flip Attacks

Das, Sanjay, Bhattacharya, Swastik, Kundu, Souvik, Kundu, Shamik, Menon, Anand, Raha, Arnab, Basu, Kanad

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have revolutionized natural language processing (NLP), excelling in tasks like text generation and summarization. However, their increasing adoption in mission-critical applications raises concerns about hardware-based threats, particularly bit-flip attacks (BFAs). BFAs, enabled by fault injection methods such as Rowhammer, target model parameters in memory, compromising both integrity and performance. Identifying critical parameters for BFAs in the vast parameter space of LLMs poses significant challenges. While prior research suggests transformer-based architectures are inherently more robust to BFAs compared to traditional deep neural networks, we challenge this assumption. For the first time, we demonstrate that as few as three bit-flips can cause catastrophic performance degradation in an LLM with billions of parameters. Current BFA techniques are inadequate for exploiting this vulnerability due to the difficulty of efficiently identifying critical parameters within the immense parameter space. To address this, we propose AttentionBreaker, a novel framework tailored for LLMs that enables efficient traversal of the parameter space to identify critical parameters. Additionally, we introduce GenBFA, an evolutionary optimization strategy designed to refine the search further, isolating the most critical bits for an efficient and effective attack. Empirical results reveal the profound vulnerability of LLMs to AttentionBreaker. For example, merely three bit-flips (4.129 x 10^-9% of total parameters) in the LLaMA3-8B-Instruct 8-bit quantized (W8) model result in a complete performance collapse: accuracy on MMLU tasks drops from 67.3% to 0%, and Wikitext perplexity skyrockets from 12.6 to 4.72 x 10^5. These findings underscore the effectiveness of AttentionBreaker in uncovering and exploiting critical vulnerabilities within LLM architectures.


Choosing the parameter of the Fermat distance: navigating geometry and noise

Chazal, Frédéric, Ferraris, Laure, Groisman, Pablo, Jonckheere, Matthieu, Pascal, Frédéric, Sapienza, Facundo

arXiv.org Machine Learning

The Fermat distance has recently been established as a useful tool for machine learning tasks when a natural distance is not directly available to the practitioner, or to improve on the results given by Euclidean distances by exploiting the geometric and statistical properties of the dataset. This distance depends on a parameter $\alpha$ that greatly impacts the performance of subsequent tasks. Ideally, the value of $\alpha$ should be large enough to navigate the geometric intricacies inherent to the problem. At the same time, it should remain small enough to sidestep the deleterious effects of noise during distance estimation. We study both theoretically and through simulations how to select this parameter.
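The sample Fermat distance is commonly defined as the minimum, over paths through the data points, of the sum of Euclidean edge lengths raised to the power $\alpha$; a small-sample sketch computes it with Dijkstra on the complete graph (fine for a few hundred points, not the paper's estimator at scale):

```python
import heapq
import math

def fermat_distance(points, src, dst, alpha):
    """Sample Fermat distance between points[src] and points[dst]:
    min over paths of sum of ||x_{i+1} - x_i||^alpha, via Dijkstra
    on the complete graph. alpha > 1 rewards paths that hop through
    dense regions of the data."""
    n = len(points)
    dist = [math.inf] * n
    dist[src] = 0.0
    pq = [(0.0, src)]
    while pq:
        du, u = heapq.heappop(pq)
        if du > dist[u]:
            continue  # stale queue entry
        if u == dst:
            return du
        for v in range(n):
            if v == u:
                continue
            alt = du + math.dist(points[u], points[v]) ** alpha
            if alt < dist[v]:
                dist[v] = alt
                heapq.heappush(pq, (alt, v))
    return dist[dst]
```

For three collinear points 0, 1, 2 with $\alpha = 2$, the direct hop costs $2^2 = 4$ while the two-hop path costs $1 + 1 = 2$: larger $\alpha$ increasingly favors routes through the data, which is exactly the geometry/noise trade-off the paper studies.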


Safety Evaluation of Robot Systems via Uncertainty Quantification

Baek, Woo-Jeong, Kröger, Torsten

arXiv.org Artificial Intelligence

In this paper, we present an approach for quantifying the propagated uncertainty of robot systems in an online and data-driven manner. Especially in Human-Robot Collaboration, keeping track of safety compliance during run time is essential: misclassifying dangerous situations as safe might result in severe accidents. According to official regulations (e.g., ISO standards), safety in industrial robot applications depends on critical parameters, such as the distance and relative velocity between humans and robots. However, safety can only be assured given a measure for the reliability of these parameters. While different risk detection and mitigation approaches exist in the literature, a measure that can be used to evaluate safety limits online, and that succinctly implies whether a situation is safe or dangerous, has been missing to date. Motivated by this, we introduce a generalizable method for calculating the propagated measurement uncertainty of arbitrary parameters that captures the accumulated uncertainty originating from sensory devices and environmental disturbances of the system. To show that our approach delivers correct results, we perform validation experiments in simulation. In addition, we employ our method in two real-world settings and demonstrate how quantifying the propagated uncertainty of critical parameters facilitates assessing safety online in Human-Robot Collaboration.
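A generic sketch of such propagation (not the paper's specific formulation): the first-order delta method pushes independent per-sensor standard deviations through a derived safety parameter, here the human-robot distance, using a finite-difference Jacobian:

```python
import numpy as np

def propagated_std(f, x, sigma, h=1e-6):
    """First-order propagation of independent measurement uncertainties:
    std(f(x)) ~= sqrt(sum_i (df/dx_i * sigma_i)^2), with the partial
    derivatives estimated by central finite differences."""
    x = np.asarray(x, dtype=float)
    J = np.array([(f(x + h * e) - f(x - h * e)) / (2 * h)
                  for e in np.eye(x.size)])
    return float(np.sqrt(np.sum((J * np.asarray(sigma)) ** 2)))

# Example derived parameter: distance between a human position (p[0], p[1])
# and a robot position (p[2], p[3]), each measured with sensor noise.
def distance(p):
    return np.hypot(p[0] - p[2], p[1] - p[3])
```

Comparing the propagated standard deviation of the distance against the regulatory safety margin is then a single online check per control cycle.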


ECSAS: Exploring Critical Scenarios from Action Sequence in Autonomous Driving

Kang, Shuting, Guo, Heng, Zhang, Lijun, Liu, Guangzhen, Xue, Yunzhi, Wu, Yanjun

arXiv.org Artificial Intelligence

Critical scenario generation requires the ability to sample critical combinations from the infinite parameter space of a logic scenario. Existing solutions aim to explore the correlation of action parameters in the initial scenario rather than action sequences. The bottleneck of the problem is how to model action sequences so that the effects of different action parameters in the scenario can be further considered. In this paper, we attack the problem by proposing the ECSAS framework. Specifically, we first propose a description language, BTScenario, allowing us to model action sequences of the scenarios. We then use reinforcement learning to search for combinations of critical action parameters. To increase efficiency, we further propose several optimizations, including action masking and replay buffers. We have implemented ECSAS, and experimental results show that it is more efficient than naive approaches such as random and combinatorial testing in various nontrivial scenarios.